fix(jit): add driver-only mode support for LaunchConfig#48

Merged
m96-chan merged 3 commits into main from fix/driver-only-build
Dec 14, 2025

@m96-chan
Owner

Summary

Fix the test-driver-only-windows CI job failure by adding driver-only mode support to native/jit/kernel.hpp.

Problem

The CI job failed with errors like:

error C3646: 'grid': unknown override specifier
error C2061: syntax error: identifier 'dim3'

This occurred because dim3 and cudaStream_t are CUDA Runtime types, but driver-only mode (PYGPUKIT_DRIVER_ONLY=ON) includes only the CUDA Driver API headers, so neither type is declared when kernel.hpp is compiled.

Solution

  • Add a portable Dim3 struct for driver-only mode; other builds alias Dim3 to the CUDA Runtime's dim3
  • Use StreamHandle from stream.hpp instead of cudaStream_t
  • Follow the existing pattern used in stream.hpp

#ifdef PYGPUKIT_DRIVER_ONLY
struct Dim3 {
    unsigned int x, y, z;
    Dim3(unsigned int x_ = 1, unsigned int y_ = 1, unsigned int z_ = 1)
        : x(x_), y(y_), z(z_) {}
};
#else
#include <cuda_runtime.h>
using Dim3 = dim3;
#endif
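
As a rough illustration of how the two replacements fit together, the sketch below builds a launch configuration from Dim3 and a stream handle without touching any CUDA Runtime header. The fields of `LaunchConfig`, the `make_1d_config` helper, and the `void*` stand-in for `StreamHandle` are all assumptions made for this example; the real definitions live in kernel.hpp and stream.hpp and may differ.

```cpp
// Sketch only: force the driver-only branch so this compiles without the CUDA toolkit.
#define PYGPUKIT_DRIVER_ONLY

#include <cstddef>

#ifdef PYGPUKIT_DRIVER_ONLY
// Portable replacement for the CUDA Runtime's dim3, as in the PR.
struct Dim3 {
    unsigned int x, y, z;
    Dim3(unsigned int x_ = 1, unsigned int y_ = 1, unsigned int z_ = 1)
        : x(x_), y(y_), z(z_) {}
};
// ASSUMPTION: opaque stand-in for the StreamHandle alias that stream.hpp provides.
using StreamHandle = void*;
#else
#include <cuda_runtime.h>
using Dim3 = dim3;
using StreamHandle = cudaStream_t;
#endif

// HYPOTHETICAL layout: the actual LaunchConfig fields are not shown in the PR.
struct LaunchConfig {
    Dim3 grid;
    Dim3 block;
    std::size_t shared_mem_bytes = 0;
    StreamHandle stream = nullptr;  // null handle selects the default stream
};

// HYPOTHETICAL helper: cover n elements with a 1-D grid of fixed-size blocks.
inline LaunchConfig make_1d_config(unsigned int n,
                                   unsigned int threads_per_block = 256) {
    LaunchConfig cfg;
    // Ceiling division written to avoid overflow when n is near UINT_MAX.
    cfg.grid = Dim3(n / threads_per_block + (n % threads_per_block != 0 ? 1u : 0u));
    cfg.block = Dim3(threads_per_block);
    return cfg;
}
```

Because the calling code names only `Dim3` and `StreamHandle`, the same source should compile in either mode; only the preprocessor branch decides which concrete types sit behind those names.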

Test plan

  • Normal build succeeds locally
  • CI test-driver-only-windows job passes

🤖 Generated with Claude Code

m96-chan and others added 3 commits December 14, 2025 15:08
Replace CUDA Runtime types with portable alternatives:
- Add Dim3 struct for driver-only mode (replaces dim3)
- Use StreamHandle from stream.hpp (replaces cudaStream_t)

This fixes the test-driver-only-windows CI job which failed because
dim3 and cudaStream_t are CUDA Runtime types not available when
building with PYGPUKIT_DRIVER_ONLY=ON.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove false claim about "no mandatory external SDKs"
- Add note: CUDA drivers and NVRTC are currently required
- Clarify PyGPUkit is NOT a PyTorch/CuPy replacement
- Update "Lightweight" feature description

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replaced 37-line checklist with compact table for v0.1-v0.2.3.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@m96-chan
Owner Author

Summary of Changes

This PR now includes three fixes:

1. Driver-Only Mode Support (kernel.hpp)

  • Added portable Dim3 struct for PYGPUKIT_DRIVER_ONLY builds
  • Changed cudaStream_t to StreamHandle (already defined in stream.hpp)
  • Verified: Local driver-only build succeeds (sm_80/86)
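
A plain struct is enough here because the Driver API's `cuLaunchKernel` takes the grid and block dimensions as six separate `unsigned int` arguments rather than as `dim3` values. The sketch below shows that unpacking step. `LaunchDims` and `flatten_dims` are hypothetical names introduced for illustration, not part of PyGPUkit, and the snippet deliberately avoids `cuda.h` so it compiles anywhere.

```cpp
// Same portable Dim3 as in the PR's driver-only branch.
struct Dim3 {
    unsigned int x, y, z;
    Dim3(unsigned int x_ = 1, unsigned int y_ = 1, unsigned int z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// HYPOTHETICAL helper type: the six scalars in the order cuLaunchKernel
// expects them (gridDimX/Y/Z followed by blockDimX/Y/Z).
struct LaunchDims {
    unsigned int grid_x, grid_y, grid_z;
    unsigned int block_x, block_y, block_z;
};

// HYPOTHETICAL helper: unpack two Dim3 values into the flat argument order.
inline LaunchDims flatten_dims(const Dim3& grid, const Dim3& block) {
    return LaunchDims{grid.x, grid.y, grid.z, block.x, block.y, block.z};
}
```

In a real driver-only build these six values would be passed straight through to `cuLaunchKernel`, followed by the shared-memory size, the stream handle, and the kernel parameter array.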

2. README False Claims Fix

  • Removed misleading "no mandatory external SDKs" claim
  • Added explicit note: "PyGPUkit currently requires CUDA drivers and NVRTC"
  • Clarified: "NOT a PyTorch/CuPy replacement"

3. Condensed Roadmap

  • Replaced 37-line checklist with compact table for v0.1-v0.2.3
  • Keeps future versions (v0.2.4+) as detailed checklists

| Version   | Highlights |
|-----------|------------|
| v0.1      | GPUArray, NVRTC JIT, add/mul/matmul, wheels |
| v0.2.0    | Rust scheduler, memory pool, kernel cache, 106 tests |
| v0.2.1    | API stabilization, error propagation |
| v0.2.2    | Ampere SGEMM (cp.async, float4), 18 TFLOPS FP32 |
| v0.2.3    | TF32 TensorCore (PTX mma.sync), 27.5 TFLOPS |

@m96-chan m96-chan merged commit 49f2ba2 into main Dec 14, 2025
12 checks passed
@m96-chan m96-chan deleted the fix/driver-only-build branch December 26, 2025 09:38